The impact of patentees assessment based on the heterogeneous patent innovation network

As a vital factor in technological innovation, patentee plays a significant role in the process of scientific and technological innovation, researching patentee has attracted the attention of experts and scholars. Previously, scholars have mainly quantified patent indicators or constructed homogeneous information networks to analyze patentees, but these methods cannot objectively measure the impact of patentees. Therefore, this study proposes a novel approach to assessing patentee impact based on a heterogeneous information network. The proposed method distinguishes the weight of different types of nodes using a weighted mechanism and extracts three types of fine-grained characteristics of network nodes. This approach results in the construction of a heterogeneous patent innovation network and the development of a new patentee impact assessment algorithm called CWAPN. Using Chinese green patents in the field of energy conservation and environmental protection as an example, experimental results show that the CWAPN algorithm can effectively assess the impact of patentees. Thereby identifying patentees who have made outstanding contributions to sustainable development in China.


Introduction
Patent as the result of innovation, it records the results and trajectory of invention, and patentee is the subject of patent rights.As the most active factor in patent activity, patentee plays a crucial role in the process of scientific and technological innovation.Consequently, analyzing patent quality and patentee impact can help identify important innovation subjects, core patent compositions, and competition and cooperation among technology subjects.Such analyses also provide objective data for science and technology management, policy formulation, and technology strategic layout decision.In previous studies, scholars evaluated the impact and innovation ability of patentees based on the number of patents or citations.The more patents a patentee invents, the greater their impact; and the more citations a patent receives, the stronger the patentee's innovation ability [1,2].In addition, patentometrics has borrowed from bibliometrics analysis methods such as the H index [3] and G index [4] to evaluate the impact of patentees.However, these methods excessively rely on the number of patent citations and can result in significant bias in patentee rankings, and it hinders the objective assessment of the impact of patentees.
To address the limitations of using patent and citation data to assess the impact of patentees, researchers have turned to building network structures.Patent data typically includes patent number, patentees, inventors, citation relationships, and patentee affiliation relationships.In previous studies, most scholars have utilized homogeneous information networks to analyze the impact of patentees.
For instance, constructing citation networks of patentees based to their citations and affiliations [5].Alternatively, it can construct a citation network for patents to calculate the importance score of patents before calculating the impact of patentees based on patent affiliation [6], or constructed collaborative networks to analyze influential patentees in technology based on patentee partnerships [7].However, "homogeneous information networks" refer to those networks that only contain one type of node and one type of relationship, which may ignore the impact of other types of entities and relationships in the dataset.In contrast, "heterogeneous information networks" encompass a broader range of information, incorporating diverse types of entities and various relationships among them.As a result, it has found extensive applications in analyzing complex social relationships between different entities [8,9].Heterogeneous information networks span various domains, including academic information networks, social media networks, and medical information networks, etc.Despite this, there has been limited research on patent innovation networks.Unlike homogeneous patent innovation networks that only incorporate homogeneous information networks related to individual patents, we have developed a heterogeneous patent innovation network encompassing two distinct entities: patents and patentees.To provide a more objective assessment of the impact of patentees, we have developed a heterogeneous patent innovation network using principles from complex network theory.This network efficiently mitigates the problems of self-citation encountered in previous homogeneous patent networks, as well as the mutual citation issues in patent citation networks.Furthermore, we utilized a weighted mechanism to differentiate the importance of nodes belonging to different categories and identify three crucial characteristics within the network's nodes.This approach results in a more sensible evaluation of the impact of patentees.Therefore, this study aims to construct a heterogeneous patent innovation network and employ the widely used PageRank algorithm to calculate the impact of entities in the network.While the PageRank algorithm provides a recursive method for analyzing the global structure of networks [10], heterogeneous information networks incorporate multiple types of entities and relationships, requiring a weighted mechanism to distinguish the importance of various entities.Moreover, node characteristics in networks play a significant role in evaluating impact.Therefore, this study attempts to explore node characteristics in the heterogeneous patent innovation network, including the credit allocation value of patentees, the H index of patentees, and the time decay factor of patents.
In the context of advocating green economic development, green technological innovations contribute to achieving the goals of enhancing enterprise competitiveness and promoting environmental protection for sustainable development.Green patents, as significant carriers of technological innovation, facilitate the analysis of these patents to unearth crucial innovators within this domain.This comprehensive understanding of our country's green innovation market conditions provides objective data support.Consequently, we collected and matched 539,297 green innovation patent records from the China National Intellectual Property Administration for the years 1985-2020.Using this dataset as our research target, we constructed a novel heterogeneous innovation network and evaluated the impact of patentees.The experimental results demonstrated that the CWAPN algorithm performs well.
The contributions of this study included the following three main parts.
(1) A novel approach to construct a heterogeneous patent innovation network is proposed, which is different from homogeneous information networks in previous studies.The constructed network contains rich structural and information, providing a more objective assessment approach for identifying influential patentees.(2) In the proposed heterogeneous patent innovation network, a weighted mechanism is utilized to distinguish the weights of various types of nodes.Moreover, three categories of node characteristics are extracted to enhance the PageRank algorithm.The CWAPN algorithm not only considers the interplay between different entities in heterogeneous information networks, but also considers the impact of network structure and node characteristics on the significance of patentees.(3) This algorithm not only provides a more accurate assessment of the impact of patentees, but also effectively addresses the recommendation problem in the patent innovation network.It is helpful for analyzing the position of patentees in the green innovation environment in China and providing decision-making support for companies or organizations to develop R&D strategies.
The subsequent sections of this paper are organized as follows.In Section 2, we present the literature review of this study.In Section 3, we describe the patentee assessment algorithm based on the heterogeneous patent innovation network.In Section 4, we introduce the data used in this study, baseline algorithms and relevant evaluation indicators.In Section 5, we show the results and analysis.Finally, in Section 6, we draw conclusions and future research prospects are discussed.

Literature review
In the context of patent innovation activities, patentees are the most active factor and play a crucial role in the innovation process.Consequently, evaluating the impact of patentees is not only an important research topic in the fields of patent analysis and technological innovation, but also holds significant practical implications for promoting technological innovation, managing technological risks, and enhancing technological competitiveness.In the existing literature, the algorithms designed for assessing the impact of patentees can be broadly categorized into two main categories: citation-based impact assessment algorithms and network structurebased impact assessment algorithms, these algorithms are summarized in Table 1.

Citation-based impact assessment algorithms
In previous studies, researchers have commonly used the number of patents as a proxy variable for innovation to measure the innovation capacity and impact of enterprises [11].However, the quality of patents is also crucial in assessing the impact of patentees, as the innovative ability of a patentee is often demonstrated by the patents to which they belong.Therefore, the number of citations a patent receives is considered an important metric of its quality and market value, with more citations indicating higher degrees of innovation.Consequently, scholars have proposed a series of assessing algorithms that use citation-based algorithm to evaluate patent quality.One such algorithm is the H index, which was originally proposed to quantify scholars' outcomes by synthesizing the quantity and quality of published papers [3].Guan and Gao subsequently introduced the H index from bibliometrics to patentometrics, with experimental results demonstrating its effectiveness in assessing technological significance and impact [12].Egghe proposed the G index to address the importance of highly cited literature that may be ignored by the H index [4].Some scholars have used these assessment algorithms, such as the H index and G index, to calculate patentees and assess their patent impact from several dimensions [13,14].All of these algorithms have successfully assessed patent impact and their computational processes are relatively simple and widely used.
However, when assessing the impact of patentees, relying solely on the number of citations for a patent can lead to biases for several reasons.Firstly, the number of citations for patents accumulates over time, and older patents have more time to accumulate citations compared to new inventions.Secondly, the frequency of patent citation in different technology fields may have different characteristics, and patentees in certain fields may be more inclined to cite the latest patent.To address these issues, researchers have proposed methods to weaken the influence of time or technological domain factors on innovation [15].Furthermore, the number of citations for each patent takes time to accumulate and may not reflect the current innovative influence of the patentee [16].Finally, some patentees attempt to increase the number of citations by various approaches, including self-citations or irrelevant citations from acquaintances such as friends, which can mislead patent impact scores and unfairly contribute to the patentee's innovation evaluation [17].Therefore, it is necessary to develop more sophisticated algorithms to assess the impact of patentees, taking into account the aforementioned factors.

Network structure-based impact assessment algorithms
Several assessment algorithms based on the number of patent citations have been developed to evaluate the impact of patentees, but they didn't consider the quality of the citations.In general, if a patent is cited by a highly quality patent or authoritative innovator, it should carry more weight in assessing its importance.To address this issue, some scholars have proposed using network structurebased algorithms to assess the impact of patentees [18][19][20].The core idea is that the importance of a node can be reflected by its position in the network, and if a node is at the core, it generally has a higher influence.Patent information datasets contain numerous entities and relationships, which are categorized into homogeneous and heterogeneous information networks based on the number of entities and relationships.
In constructing a homogeneous information network, only a single type of node and a single type of edge are considered.Therefore, when constructing a patentee information network, the patentee is the only node that exists, and depending on the patentee's affiliation, a cooperative network or a citation network can be constructed [7,21].Some scholars have constructed patent citation networks to calculate patent impact scores and obtain the impact of relevant patentees.To analyze the impact of a patentee using network scientific methods, the position of the patentee in the network is analyzed, with greater influence being associated with closer proximity to the center.This assessment can be achieved by measuring indicators of network centrality, including degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality.Additionally, common ranking algorithms such as PageRank and HITS are utilized to assess the impact of patentees.Taking the scholar impact assessment algorithms based on the network structure as an example [22], nodes represent scholars, while edges represent their collaboration relationships through co-authored papers.Scholars' influence in this academic collaboration network depends on their positions and connections within the network.The PageRank algorithm is employed to evaluate the "importance" of scholars in the academic network.A scholar is considered influential if they occupy a central position in the network and collaborate with other influential scholars.Using this algorithm, we can rank scholars based on their influence in the academic collaboration network and utilize these rankings to assess their academic reputation and contributions, thereby providing valuable information for academic decision-making.
Currently, most scholars analyzed patent data by constructing homogeneous information networks.However, patent datasets X. Liu and X. Li contain various types of entities, such as patent numbers, patentees, inventors, patent classification numbers, etc., and these entities are linked by different relationships, such as the reference relationship between patents, patent affiliation between patent and patentee, etc. Homogeneous information networks cannot fully present these information.In contrast, heterogeneous information networks can uncover different relationships between patent innovators and provide researchers with more comprehensive information [23].Thus, the study of heterogeneous information networks has emerged in the field of patent innovation.Tang et al. developed a universal theme-driven framework for analyzing and mining heterogeneous patent networks that comprise three object types: "companies, inventors, and technological content" [24].Du et al. constructed inventor-patent heterogeneous networks based on patented inventors' co-inventive relationships and invention affiliations and proposed the InventorRanking algorithm, the results showed that the algorithm can effectively identify highly influential inventors [25].Feng et al. effectively integrated patent features, such as patent documents, inventors, and applicants, to create a heterogeneous patent network with multidimensional features that can accurately identify technical innovation talent in patent datasets [26].In the context of heterogeneous academic networks, Kong et al. modified the PageRank algorithm by incorporating the time of publication of different papers to assess the impact of scholars in the paper-scholar network of random walking algorithms [27].Zhao et al. proposed the APR algorithm based on a heterogeneous author-paper network to assess the impact of scholars, defining the author's impact as the smooth distribution of the Markov process [22].However, none of the aforementioned studies established a heterogeneous patent innovation network from the perspective of patentees, combined with patent citation relationships.Furthermore, the node features of the network contain essential values that can be mined and incorporated into the algorithm to enable random walking based on partial restarts, facilitating the identification of the most influential patentees in the field of patent innovation within heterogeneous information networks.
Therefore, we propose to construct a novel heterogeneous patent innovation network to assess the impact of patentees.This approach effectively avoids the severe bias issues caused by traditional quantified metrics such as H index or G index, and also mitigates the problem of self-citation or mutual citation in the constructed homogeneous information network, making the evaluation of patentee's influence more objective.Additionally, we not only take into account the mutual influence among different entities in the network but also consider the impact of network structure and node characteristics on the importance of patentees, resulting in more reasonable evaluation outcomes.

Patentee assessment algorithm based on the heterogeneous patent innovation network
In recent years, the explosive growth of patent information in China has provided a data basis for studying technological innovation.The evaluation of innovation impact is generally transformed into a statistical analysis or importance ranking problem.This study defines the impact of patentees as the probability of a patentee being randomly accessed in a heterogeneous patent innovation network.Firstly, a heterogeneous patent innovation network is constructed, and a weighted mechanism is used to distinguish the weight of different types of nodes.Additionally, the fine-grained information of network nodes must be considered, including the credit allocation value of patentees, the H index of patentees, and the time decay factor of patents.Finally, the PageRank algorithm is improved to obtain the impact assessment of patentees in the heterogeneous patent innovation network.X. Liu and X. Li

System framework
This study proposes an algorithm for assessing the impact of patentees based on a heterogeneous patent innovation network.The system framework is shown in Fig. 1.
The system framework of the CWAPN algorithm is composed of four parts as follows.
(1) Constructing a heterogeneous patent innovation network with nodes consisting of patents and patentees.In this network, nodes p 1 , p 2 and p 3 represent patents, while a 1 and a 2 represent patentees.There are two types of relationships in the network, represented by blue edges connecting patents, denoting citation relationships, and red edges connecting patentees, denoting patent affiliation relationships.(2) Constructing a weighted heterogeneous patent innovation network.Since the constructed heterogeneous patent innovation network contains both patent and patentee nodes, it is necessary to use a weighted mechanism to distinguish the weight of different types of nodes.In this study, the weight of jumping from a patent to its reference patent is set as β, while the weight of jumping to its patentee is (1-β), and if there are more than one patents or patentees, the weight is averaged.As can be seen in Fig. 1, patent p 1 has two reference patents p 2 and p 3 , so its weight to both is β/2.Additionally, patent p 1 has only one patentee a 1 , so the weight is (1-β).Since the target node for the patentee can only be a patent, its weight remains at 1. (3) Calculating three types of network node characteristic values, including the value of the patentee's credit allocation, the patentee's H index, and the time decay factor of the patent.As shown in Fig. 1, the credit values of the two patentees a 1 and a 2 in patent p 3 are first calculated, where the credit value of patentee a 1 is 0.66, and the patentee a 2 is 0.34.Secondly, the H index of the patentee is calculated, such as the H index of patentee a 1 is 5. Finally, the time decay factor of the patent is calculated based on the date of application and the date of assessment, and from which f(age) is calculated.(4) Using a biased restart random walk model to evaluate the impact of patentees.The above three types of node features are added to the PageRank algorithm to perform the random walk process to calculate each patentee's impact score.Finally, patentees are ranked according to their scores, as shown in Fig. 1, the patentees are ranked as a 1 and a 2 .As the ranking of a 1 is higher than that of a 2 , it indicates that the impact of patentee a 1 is higher than that of patentee a 2 .

Constructing a heterogeneous patent innovation network
Definition: (heterogeneous patent innovation network) Given a network G = (V, E), where V denotes the nodes of the network including patents and patentees, and E denotes the edges of the network including citation relationships between patents, affiliation relationships between patents and patentees.Given a set of patent p = {p 1 , p 2 , ..., p m } and a set of patentee a = {a 1 ,a 2 ,..., a n }, let E PP denotes the citation links between patents, and E PA denotes the affiliation links between patents and patentees.Therefore, the heterogeneous patent innovation network is represented as a graph G = (a ∪p,E PP ∪E PA ), and we hope to get a vector a from graph G that can reflect the importance of patentees.
If the heterogeneous patent innovation network contains m patents and n patentees, the graph can be represented by a binary (m +n) × (m +n) adjacency matrix A: Where A PP is the citation matrix between patents, A PA and A AP represent patent-patentee relationships where A PA = A T AP , and a ij = 1 if node i points to node j.It is significant to note that there is no direct relationship between patentees in the graph.
Fig. 2 gives a structural example of a heterogeneous patent innovation network, which is distinct from other networks such as patent-subject networks [24][25][26] and academic paper-author networks [27][28][29].In our proposed network, there are no direct links between patentees.The is due to several reasons: (1) Relationships between patent subjects can be derived from multiple sources, such as cooperative joint patents and citation relationships between patentees.These derivative relationships often result in loss of information during graph transformation, and the number of derived relationships can increase exponentially.For example, if a patentee has applied for m patents, each citing an average of n patents and containing k co-patentees, then the number of citation relationships Fig. 2. Example of the heterogeneous patent innovation network.
X. Liu and X. Li between patentees would be increase drastically to m × n × k. (2) Direct links between patentees can lead to social network dominance in the ranking system, whereby a small group of co-patentees of a patent forms and receives more traffic, resulting in a higher ranking of group members.(3) The ranking of the patentee should be determined by patents, not by the patentee's social relationships.Therefore, we exclude the direct links between co-patentees in constructing the heterogeneous patent innovation network.
Next, we analyze the constructed heterogeneous patent innovation network.To achieve this, we define the graph adjacency matrix A in Eq. ( 1), which is then transformed into a column random matrix B while ensuring that the sum of each column is 1.By doing so, we can consider the network as a Markov chain process and represent the impact of the patentee as a smooth distribution of the Markov process.In order to achieve the eventual stabilization of the Markov chain, we introduce a new stochastic matrix M and add a virtual link with equal but small transition probability to each pair of nodes: Where e denotes a unit vector with a n-dimension, α is the damping factor, n denotes the dimensions of a matrix.Therefore, the Markov process represented by stochastic matrix M is guaranteed to be strongly connected and non-cyclical, and its stable distribution is guaranteed as well.And the ranking of patentee or patent is the main eigenvector r of matrix M, which can be calculated by Eq. ( 3).
We use Fig. 2 as an example of the heterogeneous patent innovation network consisting of seven patents and five patentees, including citations between patents and affiliations between patents and patentees.To analyze this network, we transform it into a graph adjacency matrix A, then transform A into a column random matrix B, and add virtual contiguity to create a new random matrix M. With a damping factor α = 0.85, the importance of the patentee and patent is determined by the main eigenvector r of matrix M.
The results in Fig. 3 indicate that patentee a 2 has greater influence as it has invented three patents and one of its patent contains a citation relationship.Moreover, patentee a 1 and a 3 are contained only one patent, but since patentee a 3 has a patent p 2 is cited once, hence it is more important than patentee a 1 .Fig. 3. Solutions process of the heterogeneous patent innovation network (Fig. 2 as an example).

Construct a weighted heterogeneous patent innovation network
Although constructing a heterogeneous patent innovation network and assessing the impact of patentees using the common PageRank algorithm is feasible, it lacks the ability to distinguish between the types of nodes in the network, which is an unreasonable assumption.A problem with long reference citations for patents in the constructed network was discovered, as shown in Fig. 4. In this example, Fig. 4(a) shows that patent p 1 is patented by a 1 and contains seven references; while Fig. 4(b) shows that patent p 2 is patented by a 2 and contains only two references.Consequently, a 1 receives a weight of 1/8 from p 1 , and a 2 receives a weight of 1/3 from p 2 .If p 1 and p 2 are of equal importance, then a 2 would be deemed more important than a 1 .However, for patents that require more patent references for innovation, the corresponding patentees would have a very small weight.To avoid this long reference problem, a weighted mechanism is used to distinguish the weight of the patent and patentee.Specifically, we set the jump rate from the patent to its reference patent as β, and the jump rate to the relevant patentee as (1 − β).If there are more than one reference patent or patentee, the jump rate is divided equally.Since the connection node of the patentee can only be a patent, the jump rate of the patentee to patent is not considered.Consequently, we obtain a new column stochastic matrix B, in which the sum of the weights of each column is still 1.
Fig. 4. Problem with long reference in the heterogeneous patent innovation network.

Calculating node characteristics in the heterogeneous patent innovation network
The heterogeneous patent innovation network comprises nodes of different types, each with its own unique properties.To enhance the accuracy of patentee impact assessment, we also take into account the specific information of nodes in the network, such as the value of patentee's credit allocation, the patentee's H index, and the patent's time decay factor.Credit allocation value of the patentee reflects its reputation and significance in the academic or industry domain [30,31].The H index is a scientometric index used to assess the personal contribution and impact of scientists [32,33].The time decay factor of a patent considers the age factor of the patent.Over time, the impact of a patent typically diminishes gradually [34,35].Therefore, these three types of characteristics play a significant role in assessing the impact of patentees.Credit allocation value and H index reflect the reputation and contributions of patentees in the academic or industry domains, while the time decay factor considers the timeliness of patents.By comprehensively considering these characteristics, a more accurate assessment of the impact of patentees can be achieved, thereby providing strong support for relevant decision-making and research.
(1) Credit allocation value of the patentee.Patentees can collaborate to combine ideas and technologies from different fields, leading to the creation of more innovative and high-quality patents.In the process of inventing a patent, multiple patentees may work together, each with their own contribution.It is necessary to allocate the value of the contribution of the different patentees to the same patent.
Previous studies have commonly used simple methods to allocate credit value to patentees.For patents with a single patentee, a credit value of 1 is assigned.However, for patents with multiple patentees, the fractional counting method is used to distribute the credit value equally among N patentees, where each patentee receives a credit value of 1/N [30].While this method is criticized for its unfair distribution.To solve this problem, scholars have explored various methods such as geometric, harmonic, arithmetic, and network-based methods, taking into account the order of patentees.For instance, the harmonic method allocates a score to all patentees of a patent, where the credit value of patentee i is proportional to the inverse of their ranking in the list.Thus, the higher the ranking of the patentee, the higher their credit value.In a patent with three patentees in the order of A, B, and C, the credit value of patentee A is 1/(1 + 1/2 + 1/3) = 6/11, and the credit values of patentees B and C are 3/11 and 2/11, respectively.This method considers the actual contribution of the patentee based on their order of authorship and has some rationality.
(2) H index value of the patentee.The H index is an evaluation method based on citation and is widely used in academia to assess an author's research impact [3].In this study, we propose to incorporate the H index into the PageRank algorithm to increase the probability of random access for patentees and enhance their influence.The H index of a patentee refers to the number of their patents that have been cited at least H times, with the assumption that patentees with higher H index have greater influence.Hence, we employ the H index calculation method to analyze the H index of patentees, which constitutes a crucial factor in quantifying the impact of patentees.(3) Time decay factor of the patent.This study revealed that newer patents were more likely to be cited and that recent technological innovations received greater attention.Fig. 5 shows the total number of citations for patent reference intervals in the green innovation patent dataset created in this paper.The total number of citations per patent increased over time, with the number of annual citations for a patent increases over the 0-1 year following the patent application, followed by a yearly decrease.Thus, to accurately measure the impact of a patent, more weight should be given to recent patents, with consideration of the timing of the patent application.We introduce the time decay factor of a patent in the calculation of its impact score, taking into account the evolution of its value over time.The effect of the citation timeliness of a patent on its value can be transmitted through its reference relationships.
To calculate the time decay factor of a patent, we utilize the application year and evaluation year of the patent.In this study, we use the time decay function f(age i ) as shown in Eq. ( 5), where T a denotes the year of assessment, T p denotes the year of patent application, and ρ is the parameter that controls the extent of time decay.Eq. ( 5) is a monotonically decreasing function, indicating that the time decay factor of a patent decreases with the length of time since its publication.This approach allows for a more accurate assessment of the impact of a patent by considering its timeliness in relation to the evaluation period.

Biased restart random walk model to evaluate the impact of patentees
To incorporate fine-grained information of nodes in the network, we introduce the three types of node characteristics obtained above into PageRank algorithm, and implement the random walk process in the weighted heterogeneous patent innovation network.The improved algorithm (name CWAPN) not only distinguishes different types of nodes using a weighted mechanism and jump rate, but also weights the intrinsic properties of nodes.This approach more rationally simulates the assessment process of nodes in heterogeneous patent innovation networks.The CWAPN algorithm calculates the random walking score of patentees based on the following two assumptions.
X. Liu and X. Li (1) The impact score of a patentee is affected by the score of all patents he has applied for.This indicates that the impact of a patentee in the field is manifested through the quality of patents invented.(2) The impact score of a patent is not only affected by the score of its belonging patentee, but also by the reference patents that cited it.For instance, patentees with a higher reputation are more inclined to invent high-quality patents.Moreover, if a patent is cited by another high-quality patent, its impact score should also be higher, indicating that the patent has significant technological innovation.
Therefore, the impact of patentees and patents in the constructed network mutually reinforces each other.Supposed the total number of patentees is N a and the total number of patents is N p , initial scores of patentees and patents are both set to 1/ ( N a + N p ) .The random walk model for calculating the impact of patentees in the heterogeneous patent innovation network is shown below: (1) Update the PageRank score of the patentee by Eq. (6).

PR(a
where PR(a i ) is the PageRank score obtained by a i in the network, hindex(a i ) is the H index of patentee a i , α is the damping factor, (1 − β) denotes the weight of the jump rate from patent to patentee, PR ) is the PageRank score obtained by the patent p j in the network, credit(a i ) is the credit allocation value of patentee a i in patent p j , and m denotes the total number of patents owned by patentee a i .Eq. ( 6) is composed of two parts, the first part is the restart probability value of patentee a i , which reflects that the patentee's impact is greater with a higher H index; the second part is the weighted probability of jumping from the patentee's neighbor (patent) to the patentee, which takes into account the patentee's credit allocation in each patent.The higher the ranking of the patentee, the higher their credit rating.
(2) Update the PageRank score of the patent by Eq. (7).
Where f ( age j ) is the time decay factor of patent p j , N p is the total number of patents in the network, C(a i ) is the total number of patents owned by patentee a i , n denotes the number of patentees included in p j , w is the number of citations in p j , and L(p i ) is the total number of reference patents for p j .Eq. ( 7) is composed of three parts, the first part uses the decay factor of patent's application time to weight the restart probability of p j .If the patent is relatively recent, the value of f ( age j ) is higher, indicating that the patentee is more likely to cite a recently filed patent in their patent citations; the second part indicates the probability of the patentee a i reaching the patent p j , and the impact score of the patent is influenced by the impact score of the associated patentee; the third part represents the probability of jumping from neighboring patents to the target patent, and its impact score is influenced by the impact scores of its reference patent.
(3)Repeat the above two steps until the PageRank scores of all patentees and the PageRank values of all patents converge.The convergence condition is that the sum of the difference values obtained from the two iterations of all patents and patentees is less than a fixed threshold value.Therefore, the pseudo-code for the implementation of the CWAPN algorithm proposed is shown in Algorithm 1, where the input G AP denotes the constructed heterogeneous patent innovation network, init assignee(A) is the initial PageRank value of the patentee, init patent(P) is the initial PageRank value of the patent, credit(P) is the credit allocation value of The impact score of each patentee is calculated using Algorithm 1 and then sorted in descending order based on the score values to obtain the actual impact ranking of each patentee.

Data and methods
This section presents the description of data and methods used in our study.We constructed a dataset of green innovation patents in China and provided a comprehensive explanation of the baseline algorithms relevant to patentee impact.To assess the impact of patentee, we employed NDCG and RI as evaluation indicators, providing a research foundation for investigating the performance of the CWAPN algorithm on a real-world dataset.

Green innovation patent dataset
The granted patents used in this study were obtained from the website of the China National Intellectual Property Administration (CNIPA). 1 We specifically selected granted patents instead of other types of patents, as previous researches have shown that granted patents are a better indicator of the quality and innovative capacity of patents [36,37].Then, the "green list of international patent classification 2 " published by the World Intellectual Property Organization (WIPO) was used to identify green granted patents in China.Furthermore, patent citation information was gathered from the Google Patents website 3 using the patent granted number.Therefore, the green innovation patent dataset for this study was obtained, with a time dimension of 1985-2020 and a total number of 539,297 patents.
Fig. 6 illustrates the cumulative annual number of green granted patents in China.It indicates that the quantity of green patents was relatively small before the year 2000, with a sharp exponential increase observed from 2001 to 2018, which demonstrates the growing interest in green technological innovation.Therefore, by examining Chinese green patents, important and influential patentee information can be extracted.This information aids in a comprehensive understanding of the market competition in the green innovation field, and provides objective data support for green innovation management, policy formulation, and strategic decision-making.Ultimately, this contribution can help achieve green development and double carbon targets in China.
To ensure high-quality research data, patent datasets must undergo cleaning and preprocessing to remove noise data, such as identifying and consoling unreasonable patent citation relationships and resolving previous name inconsistencies in the patentee list.After these steps, obtained the information contained in the constructed heterogeneous patent innovation network, which is presented in Table 2.
X. Liu and X. Li (excluding self-citations) against patentees over the past three years as a base fact to verify the algorithm's performance.Consistent with prior work by Chen et al. [38], we employ the PageRank algorithm to assess node importance and set the damping factor α = 0.5 for citation networks.We utilize a random walk model and execute 1000 iterations to ensure convergence, setting the stopping threshold at 1e-8.To validate the proposed algorithm, we compare it against several baseline algorithms.
(1) G: This algorithm represents the G index of a patentee, which is a derivative of the H index widely used to assess the quantity and quality academic output among researchers.The G index has been adopted by many researchers to evaluate the impact of patent fields [13].
(2) PN_SPR: This algorithm represents the constructed of a patent citation network based on the relationships between patents, followed by the application of the PageRank algorithm to calculate importance scores for each patent.The scores are then summed up to obtain the influence score of the corresponding patentees.(3) TAPRW: This algorithm is employed to assess the impact of scholars in an academic network.Firstly, it calculates the PageRank score of the paper's time perception in the citation network, and utilizes it as the initial value for random walk in the heterogeneous author-paper network.The constructed heterogeneous patent innovation network is similar to this network, this algorithm is also used as a baseline algorithm.(4) FutRank: FutureRank is a novel algorithm that leverages both the connection structure between nodes and the time series information of nodes in a network to predict their influence [39].This algorithm first applies the PageRank algorithm to the patent citation network, followed by the HITS algorithm on the patent-patentee network, and combines their results.Notably, it introduces a function that incorporates the patent application time, as it is observed that patentees are more likely to cite recently filed patents.This study utilizes the FutureRank algorithm to analyze the influence of patentees, and it is presented as a comparative algorithm in this study.(5) WAPN: This algorithm represents the weighted heterogeneous patent innovation network constructed in this study.The weighted mechanism is employed to distinguish different types of nodes, and the problem of long reference citation in patents is solved to obtain a relatively objective assessment of patentee impact.

Evaluation indicators
Assessing the impact of patentees can help in discovering key technologies and innovative subjects for core patents in the technology field.It can also reveal linkages between patentees and provide decision-making support for promoting R&D cooperation, technology research, and development strategies.Therefore, it is crucial to identify high-ranked patentees in the impact assessment algorithm of technological innovation.
However, most previous literature ranking studies face a common challenge of effectively evaluating the ranking algorithm's results due to the lack of ground-truth data.Citation count is widely accepted as a measure to determine the impact of researchers or  patentees.A high impact score indicates a certain level of prestige in the relevant technology field.Specifically, the number of citations (excluding self-citations) received by a patentee's invention in a future period is used to reflect the patentee's impact.In this study, we divided the dataset into two parts: the ranking part and the evaluation part.We use the Chinese green patent dataset spanning from 1985 to 2017 to implement these ranking algorithms, and the sum of all citations of the patentee from 2018 to 2020 was used as the patentee's impact, which served as the ground-truth ranks for the FutCit algorithm.By comparing the CWAPN algorithm with baseline algorithms, NDCG and RI indicators are used to evaluate the precision and ranking performance of the algorithm's performance.The reasons why NDCG is suitable for evaluating algorithms of assessing the impact of network nodes are as follows: (1) It considers the quality of rankings and measures the similarity between algorithmic rankings and true influence rankings.(2) It emphasizes high-influence nodes by assigning greater weight to nodes ranked higher, effectively capturing nodes with significant influence.(3) NDCG exhibits good interpretability.With values ranging from 0 to 1, a higher value indicates that the algorithmic rankings are closer to the true influence rankings, making it easy to understand and interpret the evaluation results.In summary, NDCG has the advantages of considering ranking quality, emphasizing high-influence nodes, and being interpretable, making it highly suitable for evaluating algorithms for assessing the influence of network nodes.Additionally, RI is applicable for assessing the precision of ranking algorithms in recommendation systems.This indicator is used to measure the confidence or strength of the recommendations provided by the system.In recommendation systems, recommendations are often ranked based on their intensity scores, which reflect the system's confidence in the user's potential interest in the recommended items.Higher intensity scores are assigned to items the system is more certain about, leading to their higher positions in the ranked list, as the system believes users are more likely to be interested in them.Consequently, RI not only considers ranking precision but also emphasizes recommendations with higher confidence by giving them greater weight.Therefore, using this indicator to assess the precision of ranking algorithms is also reasonable.The specific evaluation steps are as follows: (1) The ranking list of patentees obtained by the FutCit algorithm is treated as the ground-truth.(2) Different node impact assessment algorithms are applied to calculate the importance scores of patentees.(3) Nodes are sorted based on their importance scores, resulting in the algorithm's node ranking list.(4) The algorithm's node ranking list is compared with the ground-truth to compute the NDCG and RI scores, thus facilitating a comprehensive performance analysis of different influence evaluation algorithms.
(1) Normalized discounted cumulative gain (NDCG) is commonly used as a measure of ranking quality.It considers not only the relevance of ranking lists but also the location of ranking results.The NDCG incorporates two crucial ideas into its assessment: ① It rates the relevance of a patentee's list by ranking it, where a higher ranking indicates greater importance of the patentee.② It assigns a higher score to a highly correlated result occurring in a higher ranking position, just as show in Eq. (8).
where rel i denotes the relevance of the i-th result; REL denotes the ground-truth list length in top-k after ranking the relevance in descending order; DCG@k denotes the top-k nodes in the ranking, where the relevance of rel i is considered node by node and multiplied by a discounted value 1/log 2 (i + 1); IDCG@k is the maximum DCG value in the ideal case, which normalizes the value of DCG@k.In this study, the value of rel i is set to 1 if the node is in the top-k of the ground-truth list.Otherwise, it is set to 0. Therefore, The NDCG indicator not only considers the relevance of results, but also incorporates the position of the results within the ranking list, which makes it more accurate in evaluating the quality of rankings.
The result of NDCG ranges from 0 to 1.When NDCG = 0, it means that the algorithm's ranking result is completely inconsistent with the ground-truth ranking, indicating poor algorithm performance.On the other hand, when NDCG = 1, it indicates that the algorithm's ranking result perfectly matches the ground-truth ranking, demonstrating optimal algorithm performance.By observing the NDCG scores, we can intuitively understand the performance of the ranking algorithm.Higher NDCG scores imply that the algorithm can accurately predict influential patentees.Conversely, lower NDCG scores necessitate further optimization of the algorithm to enhance the precision and quality of the ranking.
(2) Recommendation intensity (RI).The RI indicator relies on two fundamental intuitions: given two sets of ranking results R1 and R2 on their top-k list, the superiority of R1 over R2 can be inferred if ① R1 contains a large number of patentees that match the groundtruth ranking, and ② the patentees that match the ground-truth ranking in R1 must occupy higher positions in the top-k list as compared to R2 [40].If R is the list of top-k patentees returned by a ranking algorithm, and L is the list of ground-truth entities.For each patentee P i in R, with its corresponding ranked order o r , the RI of P i at k can be formally defined as Eq.(9).
This implies that if the patentee P i is present in the top-k ground-truth list R and is ranked higher (i.e., has a smaller o r value), its corresponding RI value will be higher.By aggregating the RI value of each patentee in the list R, the overall RI of the list at k can be defined as Eq.(10).
Recommendation intensity shares some similarity with precision at k in the context of information retrieval.Specifically, if we X. Liu and X. Li consider the top-k recommendations in list R to be unordered and divide the value of RI, denoted by RI(P i )@k at k, then it reduces to precision at k.The result of RI are associated with the chosen value of k.By fixing the value of k, a comparison of RI scores obtained from ranking algorithms can effectively assess their performance.A higher RI score indicates a stronger recommendation strength of the algorithm, signifying higher confidence in the recommended results.Consequently, it facilitates finding influential patentees.

Results and analysis
Table 3 displays the top 20 patentees of each ranking algorithm in the Chinese green patent dataset, where "Patentee" denotes the name of the patentee in English, and "Rank" signifies the attained ground-truth ranking.Meanwhile, we computed the Pearson correlation coefficients for the rankings of the top 20 patentees in the ground truth list compared to each ranking algorithm.
According to Table 3, among the top 20 ranked patentees in the Chinese green patent dataset, 16 are universities, 3 are enterprises and 1 is a research institute.This illustrates that the primary contributors to green technological innovation in China are reputable universities, authoritative enterprises and research institutions.Additionally, a comparison between the ground-truth rankings and the rankings obtained by various algorithms reveals that the rankings by CWAPN algorithm closely aligns with the ground-truth rankings.This suggests that the proposed algorithm can more accurately identify influential patentees.By examining the Pearson correlation coefficients of the rankings for the top 20 patentees derived from the ground truth list and various algorithms, it becomes evident that the ranking produced by the CWAPN algorithm closely aligns with the ground truth rankings, which has a score of 0.76.This finding provides further substantiation for the exceptional performance of the CWAPN algorithm.Moreover, in order to better demonstrate the superiority of the CWAPN algorithm, the top 100 ranking results are presented in box plots, as shown in Fig. 7.
From the above box plots, we can see the lowest, highest, and mid-range rankings of the top of 100 patentees, with "+" representing outliers in the ranking positions.It can be seen that the ranking results of the G, PN_SPR and WAPN algorithms are relatively scattered, and the rankings of TAPRW, FutRank and CWAPN algorithms are relatively concentrated.As the ground truth ranking results range from 1 to 100, the closer the results of different ranking algorithms are, the smaller the variability of the data, indicating a lower level of data dispersion.This can be interpreted as the data points being relatively close in the statistical distribution and concentrated with a narrow range.When the box plot displays more concentrated results, it generally indicates: (1) relative stability in data distribution without many outliers, (2) higher consistency among different samples or observations with small differences, and (3) higher reliability, as the data concentration implies more credible results and greater accuracy in the algorithm's outcomes.Therefore, our analysis using the box plot reveals that the CWAPN algorithm produces ranking results that are highly concentrated, with the majority falling within the top 100.This further confirms the algorithm's ranking closely aligns with the ground truth ranking.And from the upper quartile, maximum values of the box plot, the proposed CWAPN algorithm also has a smaller ranking than TAPRW and FutRank algorithm, which shows that CWAPN algorithm is more effective than other ranking algorithms.rankings.Moreover, the CWAPN algorithm exhibits the highest average NDCG of the top 300 rankings.It is 25.42 % higher than the G index, 6.34 % higher than the PN_SPR algorithm and 2.90 % higher than the FutRank algorithm.These results further demonstrate the CWAPN algorithm's ability to accurately identify influential patentees.

RI
The RI results of top k in the heterogeneous patent innovation network for patentees are shown in Table 5.The results indicate that the performance of the CWAPN algorithm still outperforms other baseline algorithms.Moreover, this algorithm exhibits the highest average RI of the top 300 rankings.It is 26.33 % higher than the G index, 6.18 % higher than the PN_SPR algorithm and 2.27 % higher than the FutRank algorithm.This finding suggests that the rankings generated by the CWAPN algorithm are more accurate reflections of the ground-truth rankings of patentee influence.
In summary, the evaluation indicators of both NDCG and RI demonstrate that the performance of the CWAPN algorithm outperforms that of other baseline algorithms.This suggests that employing the CWAPN algorithm proposed in this paper not only produces more accurate assessments of patentee influence, but also effectively resolves recommendation issues in patent innovation networks.The algorithm facilitates the analysis of patentee's position in green innovation in China and provides valuable data support for enterprises and institutions to formulate technology research and development strategies.
To validate the algorithm's superiority, we conducted a robustness analysis of patents in the field of nuclear engineering technology, and the results are presented in Table 6.The patent dataset utilized in this experiment contains 7299 patents, with 410 relevant patentees used for prediction.The top 10 % of the data is the most valuable, hence we performed an experimental analysis on the top 10, 20, 30, and 40 ranked patentees.The results indicate that the outcomes achieved by our proposed CWAPN algorithm were consistently among the best.Therefore, we can assert with confidence that our research conclusions will remain robust even if we were to adopt patent datasets from other fields.

Conclusion
This study aims to construct a heterogeneous patent innovation network by combining the heterogeneity of network structure with information diversification.The main contribution of this work is the development of a weighted mechanism that distinguishes the weight of different types of nodes in the network and the excavation of three types of fine-grained information for network nodes, this method can effectively suppress the bias problem in the citation-based impact assessment algorithms, making the ranking results more reasonable.The study employs Chinese green patents from 1985 to 2020 as the research target, and the experimental results show that the proposed CWAPN algorithm generates a ranking of patentees that is much closer to the actual ranking, demonstrating the algorithm's validity.Hence, the implementation of the above assessment algorithms is advantageous in identifying the crucial technologies and primary patentees who have made outstanding contributions to sustainable development in China.This holds significant practical implications and applied value towards directing and evaluating the green innovation competence of businesses or organizations.
Some limitations exist in this study: (1) Only patent and patentee information is considered in the construction of the heterogeneous patent innovation network, while the impact of inventors and technology categories on the importance of patentees is not taken into account.In the future, additional information can be added to the heterogeneous information network to construct a more comprehensive network [41].The reputation, expertise, and experience of patent inventors directly influence the quality and innovative capability of patents, and also indirectly impact the level of influence of patentees.Inventors with high reputation and respect are often capable of creating high-quality patents, which in turn, contribute to increased citation frequency and enhance the influence of patentees.Patents in different technology categories may vary in their degree of innovation; in highly innovative technology fields, patentees are more likely to wield greater influence as these patents typically represent cutting-edge technologies and significant scientific advancements.Therefore, these patent-related factors play crucial roles in assessing the impact of patentees.However, the incorporation of such information would render the heterogeneous innovation network and impact assessment model more complex, necessitating a higher level of interpretability for the obtained results.(2) Node attribute information could influence the impact of nodes in a network.This paper only explores three significant attributes that affect patents and patentees.However, these nodes also encompass other attributes, such as the type of patentee (enterprises, universities, individuals, etc.), the average time of patent citation.These attribute values may have an impact on the results of assessing the impact of patentees.Therefore, in future research, we need to investigate more crucial node attribute information and analyze the implications of evaluating the impact of patentees based on these attributes through experimentation.(3) The most direct result of assessing the impact of patentees is to identify those with authoritative patentees.However, defining the impact of patentees is a complex task, and there is currently no unified concept.Therefore, there is a need for a standardized approach to measure the authority of patentees.In future research, it is crucial to comprehensively define authoritative patentees based on various factors, which will not only contribute to the study of influence but also benefit the entire academic community.
Defining the impact of patentee requires considering multiple important features and indicators.The impact of patentees can be assessed from the following perspectives: (1) Patent quantity.The more patents a patentee possesses, the greater their influence in the technological field.(2) Citation count.The number of times a patent is cited by other patents reflects its importance and impact.A higher citation count indicates greater influence and innovative of the patentee.(3) Citation quality.A higher number of citations from other high-impact patents signifies the patent's greater significance in the technological field.(4) Patent collaboration.Whether a patentee collaborates with other authoritative patentees also reflects their authority in the field.(5) Technological field.The technological field in which the patentee is involved can also affect their influence.In some fields, patentees may receive more attention and recognition.By considering the above features and indicators, the impact of patentees can be defined using both quantitative and qualitative methods.For instance, combining patent citation data, patent collaboration network analysis, and patent quality assessment can effectively measure the impact of patentees, making it clearer and more consistent.This enhances the persuasiveness of the research findings in this paper.

Fig. 5 .
Fig. 5. Patent citation gap and the total number.

Fig. 6 .
Fig. 6.Cumulative annual number of green granted patents in China.(For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

Fig. 7 .
Fig. 7. Box plots of the ranking positions for top 100 patentees.

Table 1
Summary of impact assessment algorithms.
FutRankConsidering network structure and time information Zhao et al. (2019) APR A heterogeneous author-paper network, ignore node's characteristics X. Liu and X. Li

Table 2
Summary of data characteristics of the heterogeneous patent innovation network.

Table 4
represents the NDCG results for patentees ranked in the top k positions, with k equals to 10, 20, 50, 100, 200, 300, and the average DNCG within the top 300 were obtained using assessment algorithms in the heterogeneous patent innovation network.The table clearly illustrates that our proposed CWAPN algorithm significantly outperforms all baseline algorithm at different top k

Table 3
Ranking of top 20 patentees in the Chinese green patent dataset.

Table 4
Top k of NDCG in the heterogeneous patent innovation network (%).

Table 5
Top k of RI in the heterogeneous patent innovation network.
X. Liu and X.Li